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Abstract 

Detecting gene-gene interaction in complex diseases is a major challenge for common disease genetics. Most interaction 
detection approaches use disease-marker associations and such methods have low power and unknown reliability in real 
data. We developed and tested a powerful linkage-analysis-based gene-gene interaction detection strategy based on 
conditioning the family data on a known disease-causing allele or disease-associated marker allele. We computer-generated 
multipoint linkage data for a disease caused by two epistatically interacting loci (A and B). We examined several two-locus 
epistatic inheritance models: dominant-dominant, dominant-recessive, recessive-dominant, recessive-recessive. At one of 
the loci (A), there was a known disease-related allele. We stratified the family data on the presence of this allele, eliminating 
family members who were without it. This elimination step has the effect of raising the "penetrance" at the second locus (B). 
We then calculated the lod score at the second locus (B) and compared the pre- and post-stratification lod scores at B. A 
positive difference indicated interaction. We also examined if it was possible to detect interaction with locus B based on a 
disease-marker association (instead of an identified disease allele) at locus A. We also tested whether the presence of 
genetic heterogeneity would generate false positive evidence of interaction. The power to detect interaction for a known 
disease allele was 60-90%. The probability of false positives, based on heterogeneity, was low. Decreasing linkage 
disequilibrium between the disease and marker at locus A decreased the likelihood of detecting interaction. The allele 
frequency of the associated marker made little difference to the power. 
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Introduction 

It is clear that the expression of common disease depends on the 
interaction of multiple loci. Over the past decade, the technique of 
choice for identifying disease loci has been association analysis, 
particularly Genome Wide Association Studies (GWAS). These 
studies, which usually involve thousands, if not tens of thousands of 
subjects, assume that finding highly significant statistical differ- 
ences between marker allele frequencies in case and control 
populations would guarantee that a gene strongly influencing 
disease expression would be discovered. However, the disease 
relative risk seen for most of these associated genes was so low as to 
call into question the utility of further pursuing these genes. 
Furthermore, although gene-gene interaction is viewed as critical 
for understanding common disease expression, the statistical tests 
used to detect gene-gene interaction from association data are 
weak. 

There are several current techniques to detect interaction from 
association data. One chooses two known (or hypothesized) alleles 
or genotypes at two loci and applies a regression model that 
includes main effects and interaction terms and then tests if the 
interaction terms are zero. There are variations that use logistical 



regression or multinomial regression on case-control data, case- 
only data, or family-based association data. One could conceivably 
do a pair-wise test of all SNPs in a GWAS to search for an 
association but achieving a result with enough statistical signifi- 
cance to survive a correction for the number of tests would be 
difficult. The rational alternative is to choose loci that show 
association by themselves and test for interaction. There are also 
data-mining approaches and methods that look for all combina- 
tions of alleles that appear to influence disease expression [1]. 
However, there are disadvantages to all these approaches. 

1 . The power to detect association is reduced in the face of allelic 
heterogeneity. Thus, the existence of multiple alleles at a locus 
with differing influences on disease, affect association detection. 
If one adds to that varying amounts of linkage disequilibrium 
between disease allele and marker SNP [2] and differing 
interactions with alleles at other loci, interaction detection 
becomes more difficult. 

2. Some approaches would require an enormous sample size and 
overwhelming statistical significance to correct for the number 
of tests [1]. 
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3. The biological significance of a positive result is not 
immediately clear. Given that most GWAS-discovered associ- 
ations have low relative risks, it is unclear how much more 
evidence for risk could be detected by looking for interaction. If 
an exhaustive pair-wise search approach is used, is unclear 
what detecting an interaction involving two alleles at different 
loci would mean in the absence of evidence of association 
between the disease and each allele independently. 

Combined with the low relative risks, the association analysis- 
based approach to detecting gene-gene interaction has disadvan- 
tages. Given the enormous investment necessary to determine if 
there is a biological basis for the statistical observation, a more 
robust approach that is more likely to ensure biological 
significance is needed. 

Linkage analysis identifies gene locations based on family data. 
It has the advantage of using inheritance information rather than 
gene frequency differences between case versus control popula- 
tions. Linkage analysis can best detect loci that have a major effect 
on disease expression, a characteristic that can be viewed as either 
an advantage or a disadvantage, but, given the state of our 
knowledge about common disease, we consider this as a distinct 
advantage. 

We recently showed how linkage analysis could be used to both 
prove the existence of gene-gene interaction and uncover 
additional loci that contribute to disease [3]. We successfully 
showed evidence for a second locus that interacts with BMPR2 
locus mutations to contribute to the expression of Familial 
Pulmonary Arterial Hypertension (FPAH). The BMPR2 muta- 
tions, although unequivocally involved in FPAH expression, have 
a low penetrance, an observation that, in the absence of any 
identifiable environmental influence, can only be explained by 
gene-gene interaction [3]. 

Our approach to detecting interaction stratifies the linkage data 
on alleles at a locus known to cause, or be associated with, the 
disease. In this work, we use extensive computer simulation to 
explore the statistical characteristics of using linkage stratification 
to learn the strengths and the limitations, power and efficiency of 
the method so that it can be a useful and a well-understood tool to 
identify gene-gene interaction. 

The linkage-based method we present appears quite robust. 
Linkage is insensitive to allelic heterogeneity, so the presence of 
multiple alleles at a locus does not affect power. The finding of a 
causative or associated allele under a linkage peak can be used to 
enhance the detection of other linked loci that interact with that 
allele. 

Methods 

Epistatic Genetic Models 

In an epistatically interacting two-locus disease inheritance 
model, both loci are necessary for disease expression, i.e., the 
disease genotype at both loci must be present in order for a subject 
to be affected. Depending on the gene frequencies of the disease 
alleles at the two loci, the "apparent" penetrance of the disease will 
vary, viewed as a single locus model based on either one of the loci 
alone, as will the expected proportion of affected siblings (the 
ascertainment-corrected segregation ratio) [4]. What has tradi- 
tionally been interpreted as "reduced penetrance" is based on the 
assumption of a single-locus model. In a two-locus epistatic model, 
the so-called reduced penetrance at a locus, which is a 
characteristic ascribed to the disease when assuming a single- 
locus model of inheritance, is caused by the second locus, even 
though the penetrance of the disease genotype, which by definition 



includes both loci, is full. The basis of our approach is, in effect, to 
make the model a one-locus model by ensuring almost everyone in 
the dataset has the disease genotype at the first (known) locus. 
Then, the presence of the disease alleles at the other locus will 
determine affectedness status, creating a single-locus model. 

The model we studied is an epistatic two-locus system requiring 
disease genotypes at both loci. At one locus (locus A), a known 
disease-causing allele is segregating. We first linkage analyze the 
data at locus B (the test locus) assuming the correct parameters, 
including penetrance (the reduced penetrance being caused by the 
effect of locus A), and record the lod score. We then stratify the 
data by including only carriers of the disease allele or genotype at 
the known locus A. Then, we perform a second linkage analysis at 
locus B (which, in an analysis using real data, might include all 
markers in the rest of the genome). By stratifying the data so that 
subjects without the disease allele at locus A are excluded, 
everyone remaining (except necessary connecting subjects) is a 
carrier of the disease allele or genotype at A. Thus, we effectively 
raise the penetrance at the second, interacting locus (B) because 
non-carriers of the disease allele at A cannot be affected and, by 
eliminating them, we are excluding "non-penetrant" carriers of 
the disease allele at locus B. Thus, after stratification, all subjects 
will have the disease genotype at locus A (except perhaps parents) 
and only those who are affected will have the disease genotype at 
B. 

The nature of the interaction we studied was always a two-locus 
epistatic model, in which both loci together are causative. 
However, we also examined the case where the allele leading to 
exclusion of subjects was not the causative allele at locus A but an 
allele in linkage disequilibrium (LD) with the causative allele and 
thus was associated with the disease (see below). In that case, also, 
we include in the stratified (or pruned) analysis only those offspring 
carrying the associated allele and who are thus more likely to carry 
the disease allele at a known locus (and their parents) by dint of 
that association. 

The test statistic we devised compared the evidence for linkage 
at B after pruning with the evidence for linkage at B using the 
unpruned data. We tested some variation from the above ideal 
epistatic condition that may model other realistic situations. 

Simulation parameters 

We generated multipoint linkage data for a disease 
caused by two unlinked, epistatically-interacting loci (A and B) 
using a modification of the program Caleb (http://potato. 
nationwidechildrens.org/ caleb) [5] called Xcaleb. Caleb generates 
multipoint family data in which two disease loci interact to 
produce the disease and for which the details of the interaction can 
be specified. In addition to the two disease loci, the program lets 
the user specify up to 18 single nucleotide polymorphism (SNP) 
marker loci of arbitrary gene frequency and also allows 
specification of pair-wise linkage disequilibrium (LD) between 
loci. In all our simulations, we fixed the first disease locus (locus A) 
at position 5 and the second disease locus (locus B) at position 1 3 
on the simulated "chromosome" (see Figure 1). These loci were 
not linked to each other (recombination fraction (9) = 0.5). We 
then calculated the multipoint lod scores and specifically focused 
on the lod score at the test locus B. 

The genetic distances between loci (9) used to generate the 
linkage data are shown in Figure 1. Some of the loci between 
disease locus A and disease locus B were separated by 
recombination fractions that ensured A and B were unlinked. 
Recombination fractions between the other markers surrounding 
the disease loci were fixed at 0.001. 
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Figure 1. Simulation map. All recombination fractions, except those labeled, were set to 0.001. 
doi:10.1371/journal.pone.0093398.g001 



The LD measure, D', between the disease allele at locus A and 
marker allele 1 at locus 4 was set to 1 . The gene frequency of the 
disease allele matched that of allele 1 (e.g., r = 1); thus, allele 1 at 
the marker always occurred together with the disease allele at locus 
A and never with the normal allele. The recombination fraction 
between locus A and the locus 4 marker was set to 0 (see Figure 1). 
There was no LD, and thus no disease-marker association, 
between other marker alleles. 

Each experiment consisted of 500 datasets of 50 families each. 
The ascertainment criteria were set as follows: all simulated 
families were two-generation pedigrees and were required to have 
at least 1 affected offspring and a minimum of 4 members in the 
pedigree (i.e., a two-child nuclear family). The maximum family 
size consisted of two parents and 10 offspring. The family size 
distribution was based on Cavalli-Sforza and Bodmer [6]. 

We tested four different fully penetrant epistatic two-locus 
disease models ("fully penetrant" meaning that the disease 
genotype at both loci must be present for disease expression). 
The models were: a dominant-dominant model (DD), a dominant- 
recessive model (DR), a recessive-dominant model (RD) and a 
recessive-recessive model (RR) [4,7]. The disease-causing geno- 
types are (capital letters designate disease alleles): 

DD model: AABB, AABb, AaBB, AaBb; 

DR model: AABB, AaBB; 

RD model: AABB, AABb; 

RR model: AABB. 

When the disease model at a locus was dominant, the disease 
allele frequency was fixed at 0.1; when the model was recessive, 
the frequency was set to 0.2. The marker loci allele frequencies 
were all set to 0.5. 

Pruning/Stratification 

Each simulated family was analyzed twice. In the first analysis, 
evidence for linkage at locus B was calculated using all families and 
family members. We refer to this as the "unstratified" analysis. For 
the second analysis only the offspring carrying the disease allele at 
locus A were included. (Parents were also included, irrespective of 
their genotype.) We refer to this as the "stratified" (or "pruned") 
analysis. 

We illustrate the system with a example using the DD model: 
Figure 2 shows one of the simulated families when a DD model 
was specified. The letters under each pedigree member are the 
genotypes at the two epistatic loci. (Capital letters designate disease 
alleles.) There are three affected subjects: the mother (id 2) and two 
children (id 4 and id 6). In the first analysis, all subjects were 
included in the linkage computation. In the second analysis we 
eliminated offspring 5, 7 and 8, and re-calculated the lod score. 

This procedure means that there were fewer subjects in the 
"stratified" analysis than in the "unstratified" analysis. Also, the 
number of families in the "stratified" analysis could be lower than 
the number of families in the "unstratified" analysis because 



removal of non-carrier offspring can turn families with two 
offspring into uninformative trios. 

Analysis parameters 

The correct (generating) map (see Figure 1) was used in the 
analysis. The analysis penetrance was set to 0.5 for each analyzed 
model [8]. The GENEHUNTER program [9,10], was used to 
compute the multipoint lod scores. 

In the analysis, the assumed disease allele frequencies and mode 
of inheritance were always matched to the generating values for 
locus B. 

Test statistic 

In order to devise a test to determine if we can detect interaction 
between loci A and B and define its reliability, we compared the 
lod score at locus B calculated under the unstratified analysis to the 
one obtained after pruning (recall, pruning occurs at locus A). For 
both unstratified and stratified analyses, we calculated the 
maximum lod score in the linkage interval containing locus B 
for each dataset. We then calculated the lod score difference 
between the stratified and unstratified analysis for each dataset. 
We also determined the means over all data sets for a given model. 

The test statistic is based on these differences and can be written 

as 

INT = (maxlodscore| stratiftedanalysis) 
— (maxlodscore| unstratifiedanalysis) . 

The INT statistic can be interpreted as follows: 

• If the stratified and the unstratified lod scores are both positive: 
A positive INT ocean when the lod score at locus B is found to be 

higher after pruning (i.e. when non-carriers of the disease allele at 
locus A are removed) and indicates evidence of interaction. The 
lod score can increase only if locus A and B interact (which we 
show below when we observe how the INT behaves when 
heterogeneity, rather than epistasis, is simulated). By including 
only carriers of the disease allele at locus A, we are effectively 
raising the penetrance at locus B by eliminating those subjects who 
do not have the disease genotype at locus A and therefore cannot 
be affected with the disease if it arises from an interaction between 
A and B. If a negative or null i/VT is obtained, it suggests that there is 
no interaction between the loci because the apparent penetrance 
at locus B remains the same or decreases, indicating that 
eliminating non-disease genotypes at locus A has no effect on 
the linkage signal (or a negative effect) at locus B. 

• Similarly, if the unstratified lod score is negative and the 
stratified lod score is positive: 
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Figure 2. Example of simulated family under a DD model. Capital letters designate disease alleles. 
doi:1 0.1 371 /journal.pone.0093398.g002 



This would mean that conditioning on carriers of the disease 
genotype at locus A increases the evidence for linkage at locus B. 
Thus, raising the penetrance at locus B (by eliminating family 
members incapables of being affected because they do not have 
the disease genotype at locus A, but who have the disease genotype 
at locus B) increases locus B's detectability. This happened in 
Rodriguez-Murillo et al. [3]. 

• If the unstratified lod score is positive and the stratified lod 
score is negative: 
This would yield only negative values for the INT statistics, 
indicating that conditioning on carriers of the locus A disease allele 
gives us evidence against linkage while the unstratified analysis 
produces evidence in favor of linkage. This suggests that the 
stratification process has eliminated evidence in favor of linkage at 
locus B, suggesting there is no relationship between the loci. 

Heterogeneity 

A linkage analysis can produce evidence of multiple loci. These 
signals, if not occurring by chance, can be due either to 
epistatically interacting loci or to loci that independently produce 
the phenotype, i.e., genetic heterogeneity. To test whether the 
mere existence of multiple, but non-interacting loci could lead to 
false indications of interaction, we tested stratification when the 
disease phenotype is independendy caused by two different loci. 
For this scenario, we also simulated and analyzed heterogeneity 
models analogous to the epistatic models, designated: D+D, D+R, 
R+D and R+R with the same parameters as described above. The 
simulated and analysis penetrances for the two independent loci 
were both set to 0.5. 

Associated allele 

Above, we illustrated the situation in which there is a known 
disease allele at locus A, i.e., the ideal case. We now ask: can the 
existence of a disease-marker association, rather than an actual 
disease-causing mutation at locus A, be used to identify 
interaction? We explored how strong the association between 
the disease allele and the associated marker allele needs to be in 
order to demonstrate interaction between the loci using the 
stratification approach. We tested three scenarios: 

1 . We observed the effect of changing the strength of the 
association between the disease-associated allele (at locus A) 
and the marker allele at marker locus 4 (see Figure 1) by 
changing the LD between those two alleles. We varied the LD, 
examining the D' values: 0, 0.2, 0.4, 0.6, 0.8 and 0.9. 



2. We tested the situation in which the D' is fixed at a value of 1 
but the allele frequency of the associated marker allele is varied. 
We tested allele frequencies of 0.1, 0.2, 0.3, 0.4 and 0.5. 

3. We varied both D' and the associated marker allele frequency 
simultaneously. 

The above disease-marker association scenarios weaken the 
association between the associated allele and the locus A disease 
allele compared to the first analysis in which the LD between the 
marker and the disease allele was unity. Consequently, the 
presence of the associated allele would not always signal the 
presence of the disease allele at locus A. Accordingly, when the 
data are stratified, the evidence for interaction would be weakened 
because in some carriers of the locus 4 marker, that allele would 
not be syntenic with the disease allele at A. Thus, the non-carriers 
of disease locus at A would be less likely to be eliminated. 

Results 

1. Pruning based on the presence of the disease allele 

a) Effect of pruning on sample size. When the inheritance 
model at locus A is dominant, the reduction in the number of 
families was approximately 27% (29% for the DD model and 24% 
for the DR model) and the reduction in the number of subjects was 
approximately 18% (20% for DD model and 16% for DR model). 
When locus A was recessive, the number of families after 
stratification was only 53% of the number in the unstratified 
analysis (55% for the RD model and 5 1 % for the RR model). The 
number of subjects reduced to about 37% of the original number 
when locus A was recessive (34% for RD model and 39% for RR 
model) (Table 1). 

When larger datasets with fewer replicates were simulated (100 
dataset/100 families), we observed the same percentage reduction 
in both the numbers of families and subjects (data not shown). 

b) Effect of pruning on lod scores. Although stratification 
decreases dataset size both in families and subjects, if interaction 
exists, the lod scores generally increase after stratification, making 
the INT positive. Figure 3 shows the distribution of the INT for 
the four inheritance models when the unstratified lod score 
reached a value of at least 1.5. We chose a cutoff of 1.5 after we 
observed that the average increase in lod score with pruning was 
about 1 .0. Thus, after pruning, the lod score for the dataset would, 
on the average, be 2.5 and thus usually be highly suggestive of 
linkage at a genome-wide level [11]. In almost 80% of the cases, 
the lod score at locus B increased after pruning (i.e., INT>0) 
(Table 2). 
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Figure 3. Distribution of INT statistic when unstratified lod score was at least 1.5 units. Vertical dotted lines emphasize the boundary 
between negative and positive INT values. 
doi:1 0.1 371 /journal.pone.0093398.g003 



c) INT statistic. The INT statistic was positive in almost 
82% of the cases when locus A was dominant and 84% when it 
was recessive (when the unstratified lod score was equal or greater 
than 1.5). The most power to detect interaction was obtained 
under the RR model: 89% of the datasets produced a positive INT 
statistic. 

The mean increases in the lod score for the different models 
after pruning when the unstratified lod score at locus B was greater 
or equal than 1.5 units were: DD: 0.9; DR: 1.0; RD: 0.7; RR: 1.3. 

d) Choosing data sets based on the unstratified lod 
score. If we consider datasets that showed any positive value 
of the unstratified lod score at locus B (as opposed to only 
considering unstratified lod score values & 1 .5), the mean increases 
in the lod scores after pruning were: DD: 0.9; DR: 1.0; RD: 1.1; 
RR: 1.5. (Table 2). We emphasize that this increase in information 
for linkage occurred despite the stratification-caused decrease in 
both the number of families included in the analysis and in the 
total number of family members. 

e) Increased linkage evidence using pruning. If we look 
only at datasets that had an unstratified lod score between 1 .5 and 
3 at locus B (i.e., positive but not statistically significant linkage 
evidence), whether the lod score rose to statistical significance 
depended on the underlying model. For the DD model, 95% of 
such datasets achieved a lod score greater than 3 after 
stratification. The percentages for the others models were: 100% 
of the DR datasets, 65% of the RD datasets and 87% of the RR 
datasets. Thus, pruning can also aid in confirming linkage in cases 
where the lod score is positive but not statistically significant (data 
not shown). 



Pruning also could reveal evidence in favor of linkage at locus B 
that could not be seen otherwise. 71 (14%) of the RD datasets and 
35 (7%) of the RR datasets showed a negative lod score at locus B 
under the unstratified analysis. After stratification, only 1 dataset 
(for both models) continued to show a negative lod score, while the 
mean lod score was 1.8 for the RD model and 2.4 for the RR 
model. Furthermore most of those datasets showing evidence 
against linkage at locus B increased the lod score to at least 1.5 
after stratification (45 of the RD datasets (out of the 7 1 showing 
negative lod scores) and 25 (out of 35) of the RR datasets). For the 
DD and DR models, no dataset produced a negative unstratified 
lod score (Table 1). 

2. There was no false evidence for interaction under 
heterogeneity 

We tested whether there could be evidence for interaction (a 
positive INT) when interaction is not present, i.e., when the same 
phenotype can be caused independently by two different loci 
(genetic heterogeneity). 

The reduction in the number of families and the number of 
subjects after stratification was much greater under heterogeneity 
than under epistasis. The number of families declined by 52%- 
88% and the number of subjects by about 47%-90%, with 
decreases in sample size when locus A was recessively inherited 
more pronounced than under dominant models (Table 1). 

Under heterogeneity, a positive unstratified lod score might be 
anticipated (the loci are, in fact, linked to markers), but the 
presence of heterogeneous data will diminish the lod score. (These 
calculations did not use the heterogeneity lod score (HLOD).) 
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Figure 4. Lod score values varying the D' value between marker 4 and disease locus A. Unstratified lod score is represented by solid line 
and stratified lod score by dashed line. Horizontal dotted line identifies the threshold for significant evidence of linkage. 
doi:10.1371/journal.pone.0093398.g004 



When we looked at datasets under the D+R model, the lod score 
at locus B (the recessive locus under this model), was never positive 
before stratification and never became positive when stratified. 
Under the other models, 3% of the datasets had positive INT 
values (Table 2). When the unstratified D+D model lod score at 
locus B was positive, the mean unstratified locus B lod score was 
1.5. After stratification that value decreased to -0.5. For the R+D 
model, the mean locus B lod score decreased from 2.9 to -0.3 after 
stratification and for the R+R model from 1.8 to -1.4. When the 
unstratified lod score at locus B was greater than 1.5, the INT 
statistic was never positive when heterogeneity existed (Table 2). 

Thus, when heterogeneity exists, not only does stratification not 
increase the evidence for linkage at locus B, as it does when there is 
interaction, it causes a loss of linkage evidence. 



3. Interaction with an associated marker allele rather than 
the locus A disease allele 

a) The effect of D' between the associated allele and the 
locus A disease allele. We examined how the strength of 
association affects the INT statistic by varying the value of D' 
between the associated marker 4 allele and the disease allele at 
locus A. 

We tested D' values of 0, 0.2, 0.4, 0.6, 0.8 and 0.9. In all these 
simulations, the associated allele frequency was matched to the 
locus A disease allele frequencies (0.1 for dominant and 0.2 for 
recessive). 

Figure 4 shows each model's mean lod score values for the 
unstratified analysis (solid line) and the stratified analysis (dashed 
line) at locus B, plotted against the D' values. When D'<0.8, there 
was no increase in the lod score after stratification for the RR and 



Table 3. Percentage of datasets in which INT statistic is positive when unstratified lod score was positive. 





D' value 


Model 








DD 


DR 


RD 


RR 


0.8 


41.60 


31.20 


50.58 


49.68 


0.9 


58.00 


53.00 


57.83 


67.05 


1 


82.20 


82.60 


86.25 


90.32 


doi:1 0.1 371 /journal.pone.0093398.t003 
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RD models. There was no increase for the DD and DR models, 
when D' is less than about 0.85. 

Table 3 shows the percentage of datasets in which the INT 
statistic was positive when the unstratified lod score was positive. 
For the RD and RR models, a D' of at least 0.8 is needed in order 
to have 50% power to detect interaction (INT >0). For the DD 
and DR models the D' value needed to be at least 0.9 to reach 
50% power. 

The findings when D' is unity (i.e., the associated allele is the 
disease allele at locus A) (part 2, above) showed there is greater 
than 80% power to detect gene-gene interaction. As D' decreases, 
the power drops precipitously. With a D' of 0.9, the power is 59% 
if the unstratified lod score is positive (Table 3). 

Similar to the lack of any false positives when heterogeneity, 
rather than epistasis, existed, here were also no false positives 
under any D' values, although much more testing is necessary to 
arrive at a reliable type 1 error probability. We conclude that 
testing for gene-gene interaction using the proposed stratification 
procedure is unlikely to produce a false positive indication of gene- 
gene interaction. 

b) The effect of varying the associated marker allele 
frequency. We also explored the effect on lod score varying the 
associated allele gene frequency together with D'. We tested 
marker allele frequencies of 0.1, 0.2, 0.3, 0.4 and 0.5. (Recall the 
associated allele frequency was 0.1 when locus A was dominant 
and was 0.2 when locus A was recessive.) 

We found little dependence of the INT on the allele frequency 
at D' values of 0.8-1.0 (data not shown). 

Discussion 

We have demonstrated that stratification is a useful strategy to 
detect epistatic gene-gene interaction in complex disorders. It is 
necessary to have identified a disease-causing or a disease- 
associated allele at a locus linked to the disease in order to test a 
possible interaction of that locus with another. While many 
disease-causing or -associated alleles are known, to our knowledge, 
there has been no linkage-based method to test for interactions of 
that known disease locus with other loci. In the current work, we 
have thoroughly explored the behavior of the stratification 
approach when applied to certain epistatic inheritance models. 
We have shown through extensive computer simulation tests that 
conditioning on a known disease-causing or disease-associated 
allele can demonstrate the existence of gene-gene interaction or 
confirm the existence of a second epistatically-interacting locus. 

The proposed INT statistic indicated evidence of interaction up 
to 89% of the time (considering a starting lod score of 1.5 units) 
when interaction did exist, depending on the modes of inheritance 
of the two loci, among other factors. We anticipated that a major 
confounder in applying this method, as it is in all genetic analysis, 
was likely to be heterogeneity because multiple loci could be 
detected using linkage but for which no interaction would exist. 
We used the existence of heterogeneity as a basis for estimating the 
frequency of false positive evidence of interaction. To our surprise, 
we could detect essentially no false positives under heterogeneity. 
Nonetheless, heterogeneity is likely to be a confounder in real data 
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